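As the Internet of Things (IoT) continues to grow, ensuring the security of systems that rely on wireless IoT devices has become critically important. Deep learning-based passive physical-layer transmitter authorization systems have recently been introduced because they fit the limited computational and power budgets of these devices. These systems have been shown to provide excellent anomaly detection accuracy when trained and tested on a fixed set of authorized transmitters. In real deployments, however, the set of authorized transmitters changes, so transmitters may need to be added and removed. In such cases the system may experience long periods of downtime, because retraining the underlying deep learning model is usually a time-consuming process. In this paper, we draw inspiration from information retrieval to address this problem: using feature vectors as RF fingerprints, we first show that training can be simplified to indexing these feature vectors into a database using locality-sensitive hashing (LSH). We then show that approximate nearest neighbor search can be run on the database to perform transmitter authorization that matches the accuracy of deep learning models while allowing retraining that is more than 100 times faster. In addition, we apply dimensionality reduction techniques to the feature vectors to show that the authorization latency of our technique can be reduced to approach that of deep learning-based systems.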
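The indexing and lookup steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes random-hyperplane LSH with cosine similarity, and the class name `HyperplaneLSH`, the function `is_authorized`, the table and bit counts, and the 0.9 threshold are all hypothetical choices. Feature extraction from RF signals is out of scope here, so fingerprints are taken to be plain vectors.

```python
import numpy as np
from collections import defaultdict


class HyperplaneLSH:
    """Random-hyperplane LSH index over feature vectors (illustrative only)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        # One independent set of random hyperplanes per hash table.
        self.planes = rng.standard_normal((n_tables, n_bits, dim))
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = []

    def _keys(self, v):
        # The sign pattern of the projections is the bucket key in each table.
        bits = ((self.planes @ v) > 0).astype(np.uint8)  # (n_tables, n_bits)
        return [row.tobytes() for row in bits]

    def add(self, v):
        """Enroll one fingerprint: an index insert, not a training run."""
        v = np.asarray(v, dtype=float)
        idx = len(self.vectors)
        self.vectors.append(v)
        for table, key in zip(self.tables, self._keys(v)):
            table[key].append(idx)

    def nearest_cosine(self, q):
        """Best cosine similarity among bucket candidates, or None if none."""
        q = np.asarray(q, dtype=float)
        candidates = set()
        for table, key in zip(self.tables, self._keys(q)):
            candidates.update(table.get(key, []))
        if not candidates:
            return None
        cands = np.stack([self.vectors[i] for i in candidates])
        sims = cands @ q / (np.linalg.norm(cands, axis=1) * np.linalg.norm(q))
        return float(sims.max())


def is_authorized(index, fingerprint, threshold=0.9):
    """Accept a fingerprint if an enrolled vector is similar enough (assumed rule)."""
    sim = index.nearest_cosine(fingerprint)
    return sim is not None and sim >= threshold
```

Under these assumptions, enrolling a new transmitter amounts to calling `add` on its fingerprints, which is why updating the authorized set can be much cheaper than retraining a classifier. Removal would need an extra bookkeeping step that this sketch omits.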
Translated by Google Translate
The world currently offers an abundance of data in multiple domains, from which we can learn reinforcement learning (RL) policies without further interaction with the environment. Learning RL agents offline from such data is possible, but deploying them while they are still learning might be dangerous in domains where safety is critical. Therefore, it is essential to find a way to estimate how a newly learned agent will perform if deployed in the target environment before actually deploying it, and without the risk of overestimating its true performance. To achieve this, we introduce a framework for safe evaluation of offline learning using approximate high-confidence off-policy evaluation (HCOPE) to estimate the performance of offline policies during learning. In our setting, we assume a source of data, which we split into a train set, used to learn an offline policy, and a test set, used to estimate a lower bound on the offline policy's performance using off-policy evaluation with bootstrapping. A lower-bound estimate tells us how well a newly learned target policy would perform before it is deployed in the real environment, and therefore allows us to decide when to deploy our learned policy.
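A bootstrapped lower bound of the kind described here can be sketched as below. This is a hedged illustration under stated assumptions rather than the paper's procedure: it uses ordinary per-trajectory importance sampling as the off-policy estimator and a percentile bootstrap for the bound. The function names, the trajectory format of `(target_prob, behavior_prob, reward)` tuples, and the defaults for `delta` and `n_boot` are hypothetical.

```python
import numpy as np


def importance_weighted_returns(trajectories, gamma=1.0):
    """Per-trajectory importance-sampling estimates of the target policy's return.

    Each trajectory is a list of (target_prob, behavior_prob, reward) tuples,
    where the probabilities are those of the logged action under each policy.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (p_target, p_behavior, reward) in enumerate(traj):
            weight *= p_target / p_behavior   # cumulative importance ratio
            ret += (gamma ** t) * reward      # discounted return
        estimates.append(weight * ret)
    return np.asarray(estimates, dtype=float)


def bootstrap_lower_bound(estimates, delta=0.05, n_boot=2000, seed=0):
    """Approximate (1 - delta) lower bound on the mean via percentile bootstrap."""
    estimates = np.asarray(estimates, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(estimates)
    # Resample the test-set estimates with replacement, n_boot times.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = estimates[idx].mean(axis=1)
    return float(np.quantile(boot_means, delta))
```

One possible deployment rule, again an assumption, is to deploy the learned policy only once this lower bound exceeds the performance of the policy currently in use.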
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, $\pi_e$, using a fixed dataset, $\mathcal{D}$, collected by one or more policies that may be different from $\pi_e$. Current OPE algorithms may produce poor OPE estimates under policy distribution shift, i.e., when the probability of a particular state-action pair occurring under $\pi_e$ is very different from the probability of that same pair occurring in $\mathcal{D}$ (Voloshin et al. 2021, Fu et al. 2021). In this work, we propose to improve the accuracy of OPE estimators by projecting the high-dimensional state-space into a low-dimensional state-space using concepts from the state abstraction literature. Specifically, we consider marginalized importance sampling (MIS) OPE algorithms, which compute state-action distribution correction ratios to produce their OPE estimate. In the original ground state-space, these ratios may have high variance, which may lead to high-variance OPE. However, we prove that in the lower-dimensional abstract state-space the ratios can have lower variance, resulting in lower-variance OPE. We then highlight the challenges that arise when estimating the abstract ratios from data, identify sufficient conditions to overcome these issues, and present a minimax optimization problem whose solution yields these abstract ratios. Finally, our empirical evaluation on difficult, high-dimensional state-space OPE tasks shows that the abstract ratios can make MIS OPE estimators achieve lower mean-squared error and be more robust to hyperparameter tuning than the ground ratios.
Reinforcement Learning (RL) can solve complex tasks but does not intrinsically provide any guarantees on system behavior. For real-world systems that fulfill safety-critical tasks, such guarantees on safety specifications are necessary. To bridge this gap, we propose a verifiably safe RL procedure with probabilistic guarantees. First, our approach probabilistically verifies a candidate controller with respect to a temporal logic specification, while randomizing the controller's inputs within a bounded set. Then, we use RL to improve the performance of this probabilistically verified, i.e. safe, controller and explore in the same bounded set around the controller's input as was randomized over in the verification step. Finally, we calculate probabilistic safety guarantees with respect to temporal logic specifications for the learned agent. Our approach is efficient for continuous action and state spaces and separates safety verification and performance improvement into two independent steps. We evaluate our approach on a safe evasion task where a robot has to evade a dynamic obstacle in a specific manner while trying to reach a goal. The results show that our verifiably safe RL approach leads to efficient learning and performance improvements while maintaining safety specifications.
Artificial intelligence methods including deep neural networks (DNN) can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when corresponding histologic features are poorly defined. Here, we present a method for improving explainability of DNN models using synthetic histology generated by a conditional generative adversarial network (cGAN). We show that cGANs generate high-quality synthetic histology images that can be leveraged for explaining DNN models trained to classify molecularly-subtyped tumors, exposing histologic features associated with molecular state. Fine-tuning synthetic histology through class and layer blending illustrates nuanced morphologic differences between tumor subtypes. Finally, we demonstrate the use of synthetic histology for augmenting pathologist-in-training education, showing that these intuitive visualizations can reinforce and improve understanding of histologic manifestations of tumor biology.
In this paper, we address the stochastic contextual linear bandit problem, where a decision maker is provided a context (a random set of actions drawn from a distribution). The expected reward of each action is specified by the inner product of the action and an unknown parameter. The goal is to design an algorithm that learns to play as close as possible to the unknown optimal policy after a number of action plays. This problem is considered more challenging than the linear bandit problem, which can be viewed as a contextual bandit problem with a \emph{fixed} context. Surprisingly, in this paper, we show that the stochastic contextual problem can be solved as if it were a linear bandit problem. In particular, we establish a novel reduction framework that converts every stochastic contextual linear bandit instance to a linear bandit instance when the context distribution is known. When the context distribution is unknown, we establish an algorithm that reduces the stochastic contextual instance to a sequence of linear bandit instances with small misspecifications and achieves nearly the same worst-case regret bound as the algorithm that solves the misspecified linear bandit instances. As a consequence, our results imply an $O(d\sqrt{T\log T})$ high-probability regret bound for contextual linear bandits, making progress in resolving an open problem in (Li et al., 2019), (Li et al., 2021). Our reduction framework opens up a new way to approach stochastic contextual linear bandit problems, and enables improved regret bounds in a number of instances including the batch setting, contextual bandits with misspecifications, contextual bandits with sparse unknown parameters, and contextual bandits with adversarial corruption.
It is well known that a classification model performs effectively only if the datasets used for training and testing satisfy certain requirements. In other words, the larger, more balanced, and more representative the dataset, the more one can trust the proposed model's effectiveness and, consequently, the obtained results. Unfortunately, large anonymous datasets are generally not publicly available in biomedical applications, especially those dealing with pathological human face images. This makes deep-learning-based approaches challenging to deploy and makes some published results difficult to reproduce or verify. In this paper, we suggest an efficient method to generate a realistic anonymous synthetic dataset of human faces with the attributes of acne disorders corresponding to three levels of severity (i.e., Mild, Moderate, and Severe). To this end, a specific hierarchical StyleGAN-based algorithm trained at distinct levels is considered. To evaluate the performance of the proposed scheme, we consider a CNN-based classification system, trained using the generated synthetic acneic face images and tested using authentic face images. We show that an accuracy of 97.6\% is achieved using InceptionResNetv2. As a result, this work allows the scientific community to employ the generated synthetic dataset for any data processing application without legal or ethical restrictions. Moreover, this approach can also be extended to other applications requiring the generation of synthetic medical images. We can make the code and the generated dataset accessible to the scientific community.
In this paper, we present a new theoretical approach for enabling domain knowledge acquisition by intelligent systems. We introduce a hybrid model that starts with minimal input knowledge in the form of an upper ontology of concepts, stores and reasons over this knowledge through a knowledge graph database and learns new information through a Logic Neural Network. We study the behavior of this architecture when handling new data and show that the final system is capable of enriching its current knowledge as well as extending it to new domains.
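Sensor fusion can significantly improve the performance of many computer vision tasks. However, traditional fusion approaches are either not data-driven, so they can neither exploit prior knowledge nor find regularities in a given dataset, or they are restricted to a single application. We overcome this shortcoming by presenting a novel deep hierarchical variational autoencoder, called FusionVAE, that can serve as a basis for many fusion tasks. Our approach is able to generate diverse image samples that are conditioned on multiple noisy, occluded, or only partially visible input images. We derive and optimize a variational lower bound for the conditional log-likelihood of the fusion. To thoroughly assess the fusion capabilities of our model, we created three novel image fusion datasets based on popular computer vision datasets. In our experiments, we show that FusionVAE learns a representation of aggregated information that is relevant to fusion tasks. The results demonstrate that our approach significantly outperforms traditional methods. Furthermore, we present the advantages and disadvantages of different design choices.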
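In various control task domains, existing controllers provide a baseline level of performance that, though possibly suboptimal, should be maintained. Reinforcement learning (RL) algorithms that rely on extensive exploration of the state and action space can be used to optimize a control policy. However, fully exploratory RL algorithms may decrease performance below the baseline level during training. In this paper, we address the problem of online optimization of a control policy while minimizing regret w.r.t. the baseline policy performance. We present a joint imitation-reinforcement learning framework, denoted JIRL. The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives: \textbf{(a)} leveraging the baseline's online demonstrations to minimize regret w.r.t. the baseline policy during training, and \textbf{(b)} eventually surpassing the baseline performance. JIRL addresses these objectives by initially learning to imitate the baseline policy and gradually shifting control from the baseline to an RL agent. Experimental results show that JIRL effectively accomplishes the aforementioned objectives in several continuous action-space domains. The results demonstrate that JIRL is comparable to a state-of-the-art algorithm in final performance while incurring lower baseline regret during training in all of the presented domains. Moreover, the results show a reduction factor of up to $21$ in baseline regret over a state-of-the-art baseline regret minimization approach.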
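The gradual shift of control from the baseline to the RL agent can be pictured with a small sketch. The abstract does not say how the handover is scheduled, so the linear ramp below is purely an assumption for illustration, and `rl_control_probability`, `select_action`, and `handover_steps` are hypothetical names rather than parts of JIRL.

```python
import random


def rl_control_probability(step, handover_steps):
    """Hypothetical linear ramp from 0 (baseline only) to 1 (RL agent only)."""
    return min(1.0, max(0.0, step / handover_steps))


def select_action(state, step, baseline_policy, rl_policy, handover_steps,
                  rng=random):
    """Pick which controller acts at this step; returns (action, source).

    Early in training the baseline acts (and can serve as a demonstrator for
    imitation); later the RL agent acts with increasing probability.
    """
    if rng.random() < rl_control_probability(step, handover_steps):
        return rl_policy(state), "rl"
    return baseline_policy(state), "baseline"
```

With this schedule the baseline is in full control at step 0 and the RL agent is in full control from `handover_steps` onward. An adaptive rule tied to the agent's measured performance would be another option.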